Given that I was responsible for putting together the system
architecture for
Github when they
moved away from their previous services provider to
Anchor Systems, I keep an eye
on what's going on there even though I've gotten out of professional
IT.
Recently(ish), Ryan Tomayko blogged about some
experiments
in using Amazon CloudFront as a CDN for static assets. While
discussing as-yet unsolved performance issues, he quoted a tweet from
Shopify's CEO (who I would have thought would have known better):
Protip to make SSL fast: terminate in your load balancer, that makes SSL
session cookies actually work.
No,
no, NO -- a million times
NOOOOOO!
Load balancers are, by their very nature, performance chokepoints. You
can't horizontally scale your load balancer, because if you do (by, say,
putting another load balancer in front of it -- don't laugh, I've seen it
done) you've just re-introduced the very problem that putting SSL on
the load balancer was supposed to solve (that is, getting SSL
session caching to work).
The only thing you accomplish by putting your SSL sessions onto the
load balancer is to bring the moment your load balancer melts down
through excessive CPU usage
that much closer. I dunno,
maybe Shopify doesn't actually do that much traffic or something,
but I certainly wouldn't be signing up for that headache if it were
my infrastructure.
(Sidenote: yes, I know that you can get "SSL hardware accelerators",
which put the basic crypto operations into dedicated silicon, but
they're rarely cheap, and you can't horizontally scale those,
either. They're effectively a specialised and outrageously
expensive form of "scaling by bigger hardware" -- which is a valid
method of scaling, but one with fairly terminal limitations that
would certainly make it impractical for Github.)
The problem you're trying to solve by centralising SSL operations is
getting session resumption to work: SSL includes a mechanism whereby
a client making a new TCP connection can say "here's the crypto
details from last time I connected to you" and the server can just
pull out all of the per-connection data that would otherwise take a
long time (a couple of extra round trips and a smallish chunk of CPU
time) to set up. This can make SSL connections a lot snappier (in
Ryan's words, "We spend roughly as much time handshaking as we do
generating the average dynamic request."), so it's a definite win if
you can do it (and yes, you can do it).
Webservers don't typically do this caching by default, but setting
up a simple SSL session cache for a single
nginx
or
Apache
webserver is pretty trivial. Testing it is simple, too: just
install
gnutls-bin and run:
gnutls-cli -V -r HOSTNAME | grep 'Session ID'
If all three Session IDs are the same, then you've got SSL session caching
running. (If you're wondering why I'm using
gnutls-cli rather
than
openssl s_client, it's because openssl is living in the
pre-IPv6 dark ages, and I've got IPv6-only services running that I
wanted to test.)
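For reference, the nginx side of "pretty trivial" is a couple of
directives in your http or server block. The cache name and sizes
below are arbitrary, picked purely for illustration:

ssl_session_cache   shared:SSL:10m;   # ~10MB shared-memory cache, visible to all worker processes
ssl_session_timeout 10m;              # how long a cached session stays resumable

That gets you a cache shared between the worker processes on a single
machine -- which, as we're about to see, is precisely its limitation.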
So that's great for your little single-server hosting setup, but
per-machine caches aren't much use when you're running behind a load
balancer, Github style, because each SSL-enabled TCP connection can
very well end up on a different backend server, whose local cache
knows nothing about the session. (If you think the answer to this problem
is "session affinity", you're setting yourself up for a whole other
world of pain and suffering.)
Here's the kicker, though -- the SSL session is just a small, opaque
chunk of bytes (OpenSSL appears to store it internally as an ASN.1
string, but that's an irrelevant implementation detail). An SSL
session cache is just a mapping of the SSL session ID (a string) to
the SSL session data (another string). We know how to do "shared
key-value caches" -- we have the technology. There won't be too
many large-scale sites out there that aren't using memcached (or
something practically equivalent).
So, rather than stuff around trying to run a load balancer with
enough grunt to handle all your SSL communication for all time, you
can continue to horizontally scale the whole site with as many
backend webservers as you need. All that's needed is to teach your
SSL-using server to talk to a shared key-value cache.
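To make the shape of that concrete, here's a bare-bones sketch of how
an external session cache plugs into OpenSSL. This is not the nginx
patch itself, just an illustration of the hooks involved:
cache_set() and cache_get() are hypothetical stand-ins for whatever
memcached client you'd actually use.

/* Minimal sketch of an external SSL session cache using OpenSSL's
 * session callbacks.  cache_set()/cache_get() are hypothetical helpers
 * standing in for a real memcached (or similar) client; cache_get()
 * is assumed to return a malloc()ed buffer, or NULL on a miss. */
#include <openssl/ssl.h>
#include <stdlib.h>

extern int cache_set(const unsigned char *key, size_t key_len,
                     const unsigned char *val, size_t val_len, int ttl);
extern unsigned char *cache_get(const unsigned char *key, size_t key_len,
                                size_t *val_len);

static int new_session_cb(SSL *ssl, SSL_SESSION *sess)
{
    unsigned int id_len;
    const unsigned char *id = SSL_SESSION_get_id(sess, &id_len);

    /* Serialise the session -- the small, opaque chunk of bytes. */
    int der_len = i2d_SSL_SESSION(sess, NULL);
    unsigned char buf[4096], *p = buf;
    if (der_len <= 0 || der_len > (int)sizeof(buf))
        return 0;
    i2d_SSL_SESSION(sess, &p);

    cache_set(id, id_len, buf, (size_t)der_len, 600 /* seconds */);
    return 0;   /* 0: we haven't kept a reference to the session */
}

/* Note: older OpenSSL releases declare the id argument without const. */
static SSL_SESSION *get_session_cb(SSL *ssl, const unsigned char *id,
                                   int id_len, int *copy)
{
    size_t der_len;
    unsigned char *der = cache_get(id, (size_t)id_len, &der_len);
    if (der == NULL)
        return NULL;        /* cache miss: full handshake it is */

    const unsigned char *p = der;
    SSL_SESSION *sess = d2i_SSL_SESSION(NULL, &p, (long)der_len);
    free(der);
    *copy = 0;              /* OpenSSL takes ownership of the session */
    return sess;
}

void enable_shared_session_cache(SSL_CTX *ctx)
{
    /* Cache server-side sessions, but skip OpenSSL's per-process cache
     * so every lookup goes through the shared store. */
    SSL_CTX_set_session_cache_mode(ctx,
        SSL_SESS_CACHE_SERVER | SSL_SESS_CACHE_NO_INTERNAL);
    SSL_CTX_sess_set_new_cb(ctx, new_session_cb);
    SSL_CTX_sess_set_get_cb(ctx, get_session_cb);
}

The important point is that the server never needs to understand the
session data: it just shovels opaque strings in and out of the cache,
keyed by session ID.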
Far be it from me to claim any sort of originality in this idea, though --
patches for Apache's mod_ssl to do this have been floating around for a
while, and according to
the
release notes, the 2.4 series will have it all built-in.
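If the documentation is to be believed, wiring mod_ssl up to memcached
in 2.4 will be a one-liner along these lines (the hostname and port
are placeholders, and the exact syntax may well shift before release):

SSLSessionCache memcache:memcached.example.com:11211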
However, Github uses nginx, to great effect, and nobody's done the work
to put this feature into nginx (that I could find, anyway).
Until now.
I present, for the edification of all,
a series of patches to nginx
0.8.x to provide memcached-enabled SSL session caching. Porting
them to other versions of nginx hopefully won't be too arduous, and
if you ask nicely I might give it a shot. If you don't use
memcached (gasp!), my work should at least be a template to allow
you to hook into whatever system you prefer. Sharding the memcached
calls wouldn't be hard either, but I'm not a huge fan of that when
other options are available.
I had originally hoped to be able to say "it works at Github!" as a
testimonial, but for some reason it hasn't been deployed there yet
(go forth and pester!), so instead I'm just releasing it out there
for people to try out on a "well, I don't know that it
doesn't work" basis. If it breaks anything, it's not my
fault; feel free to supply patches if you want to fix anything I've
missed.